feat(swarm): replace address scoring with explicit candidates #3954

thomaseizinger · 2023-05-16T12:55:50Z

Description

Previously, a NetworkBehaviour could report an AddressScore for an external address. This score was a u32 and addresses would be ranked amongst those.

In reality, an address is either confirmed to be publicly reachable (via a protocol such as AutoNAT) or merely represents a candidate that might be an external address. In a way, addresses are guilty (private) until proven innocent (publicly reachable).

When a NetworkBehaviour reports an address candidate, we perform address translation on it to potentially correct for ephemeral ports of TCP. These candidates are then injected back into the NetworkBehaviour. Protocols such as AutoNAT can use these addresses as a source for probing their NAT status. Once confirmed, they can emit a ToSwarm::ExternalAddrConfirmed event which again will be passed to all NetworkBehaviours.

This simplified approach will allow us implement Kademlia's client-mode (#2032) without additional configuration options: As soon as an address is reported as publicly reachable, we can activate server-mode for that connection.

Related: #3877.
Related: #3953.
Related: #2032.
Related: libp2p/go-libp2p#2229.

Co-authored-by: Max Inden mail@max-inden.de

Notes & open questions

Change checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
A changelog entry has been made in the appropriate crates

thomaseizinger · 2023-05-17T14:19:17Z

For kademlia to be useful, we only want publicly reachable addresses in our routing table. Thus, as a general rule, we will only ever add a remote's address to the routing table if:

we successfully dialed them
the remote advertises the kademlia protocol on this connection:

rust-libp2p/protocols/kad/src/behaviour.rs

Lines 2048 to 2059 in cc5b346

    
           KademliaHandlerEvent::ProtocolConfirmed { endpoint } => { 
        
               debug_assert!(self.connected_peers.contains(&source)); 
        
               // The remote's address can only be put into the routing table, 
        
               // and thus shared with other nodes, if the local node is the dialer, 
        
               // since the remote address on an inbound connection may be specific 
        
               // to that connection (e.g. typically the TCP port numbers). 
        
               let address = match endpoint { 
        
                   ConnectedPoint::Dialer { address, .. } => Some(address), 
        
                   ConnectedPoint::Listener { .. } => None, 
        
               }; 
        
               self.connection_updated(source, address, NodeStatus::Connected); 
        
           }

(2) is where the client/server mode implementation comes in. A node SHOULD automatically enable/disable advertising the kademlia protocol based on whether it is publicly reachable on an address. For example, the inbound connection could have been hole-punched, meaning the address isn't generally publicly reachable and thus we should not advertise the kademlia protocol there.

How do we figure that out? With protocol like AutoNAT and identify.

To start with, a node only knows about its listen addresses. We inject them into the NetworkBehaviour via FromSwarm::NewListenAddr.

As soon as a connection is established (regardless of inbound or outbound), we run the identify protocol. Identify reports us the address that the other peer observed for us. This is essentially STUN for libp2p.

For example, if we are listening on 192.168.0.13 and are sitting behind a router with the public IP 91.123.03.58 then the other node will report that as the address it is seeing the traffic from.

Using this PR, we can now emit this address via ToSwarm::NewExternalAddrCandidate and have protocols such as AutoNAT test whether we are reachable via this address.

Let's assume for a moment that we are and AutoNAT reports that we are indeed reachable via 91.123.03.58.

Here comes the tricky question: What local interface does this correspond to? If the user specifies a wildcard address for listening, we will start listening on all interfaces available, including loopback. Filtering for private and loopback addresses is not enough though, a machine could easily have multiple NICs and thus multiple different IP addresses.

Currently, we do not expose which local interface an outbound connection corresponds to:

rust-libp2p/swarm/src/behaviour.rs

Lines 178 to 190 in cc5b346

    
               /// Callback that is invoked for every established outbound connection. 
        
               /// 
        
               /// This is invoked once we have successfully dialed a peer. 
        
               /// At this point, we have verified their [`PeerId`] and we know, which particular [`Multiaddr`] succeeded in the dial. 
        
               /// In order to actually use this connection, this function must return a [`ConnectionHandler`](crate::ConnectionHandler). 
        
               /// Returning an error will immediately close the connection. 
        
               fn handle_established_outbound_connection( 
        
                   &mut self, 
        
                   _connection_id: ConnectionId, 
        
                   peer: PeerId, 
        
                   addr: &Multiaddr, 
        
                   role_override: Endpoint, 
        
               ) -> Result<THandler<Self>, ConnectionDenied>;

To go back to kademlia's client mode: The only thing we know there is which local interface a connection arrived on. To say with confidence that this interface is publicly reachable, we need a mapping of an external reachable address back to the local interface.

I don't think our AutoNAT implementation currently does this. I think what we'd need to implement in this PR is the following:

ToSwarm::NewExternalAddrCandidate as well as ToSwarm::ExternalAddrConfirmed need to carry a tuple of addresses: The address of the local interface and the candidate / confirmed external address.
To satisfy this interfaces, identify will need access to the local address of outbound connections (for inbound ones, we already have access to it). Initial testing with TCP showed that we can extract this information from the socket returned from accept.
For AutoNAT, we will also have to record the local addr of incoming connections and correlate it with the probe of an external address.

@mxinden Please let me know what you think and whether you see any hole in this reasoning. I believe this is the most correct way to identify external addresses but it will also require some effort in implementing. A shortcut we could take in the meantime is to assume we are publicly reachable on all interfaces if we have at least one externally reachable address.

mxinden · 2023-05-17T14:42:11Z

Here comes the tricky question: What local interface does this correspond to?

Why do we need to know what the corresponding local interface is?

thomaseizinger · 2023-05-17T15:17:43Z

Here comes the tricky question: What local interface does this correspond to?

Why do we need to know what the corresponding local interface is?

If the machine has multiple NICs and thus multiple local interfaces then the external address only corresponds to one of them, correct?

Say I have two local addresses:

An Ethernet connection that has: 10.0.0.15
A WiFi connection that has: 192.168.0.17

The Ethernet connection is only connected to a LAN whereas the WiFi connection is routable to the public internet with a forwarded port.

Say I am also running multiple libp2p nodes in my local network and thus have incoming connections on both interfaces.

It would be wrong to advertise kademlia on the Ethernet connection because it would cause those other nodes in my network to include this address in the routing table despite not being globally reachable.

I know this is a constructed example but people might have all sorts of weird network topologies. Fundamentally, I just think it might not be a good idea to assume any incoming connection is publicly routable just because we identified an external address for one of our connections?

mxinden · 2023-05-18T02:40:27Z

Thanks for expanding, that makes sense.

I suggest treating the multiple-interfaces use-case as an additional feature. I prefer to solve the single-interface use-case first. As far as I can tell the single-interface use-case is a step forward, even when the long term goal would be the multiple-interface use-case.

I am afraid that the multiple-interface use-case is a rabbit-hole which we are unable to solve properly without having a concrete use-case that we are optimizing for.

I just think it might not be a good idea to assume any incoming connection is publicly routable just because we identified an external address for one of our connections?

Meta level, I think what you are referring to here is "any listen address of an incoming connection is publicly routable". A connection itself is never routable, neither publicly nor privately within a subnet.

mxinden

Overall looks good to me. Next to making client-mode a lot easier, I find this to be a nice clean-up.

swarm/src/behaviour.rs

mxinden · 2023-05-18T02:50:18Z

swarm/src/behaviour.rs

+    /// - A protocol such as identify obtained it from a remote.
+    /// - The user provided it based on configuration.
+    /// - We made an educated guess based on one of our listen addresses.
+    /// - We established a new relay connection.


Do we have any mechanism to confirm a relayed connection? I am not aware of a libp2p-autonat implementation that would probe relayed addresses. That said, emitting them as NewExternalAddrCandidate makes sense, as we can not confirm them. How would libp2p-identify make sure to include relayed addresses when advertising the local node's listenAddrs (i.e. external addresses)? Would libp2p-identify ignore all FromSwarm::NewExternalAddr addresses except for the relayed ones?

I don't think so but the relay could (should?) emit an event that the relay address is now an external address of ours.

How would libp2p-identify make sure to include relayed addresses when advertising the local node's listenAddrs (i.e. external addresses)? Would libp2p-identify ignore all FromSwarm::NewExternalAddr addresses except for the relayed ones?

Why ignore them? A relayed address is just one more external address, isn't it? It is feasible to use relays to translate between IPv4 and IPv6 for example if node you want to connect to is only on IPv6 and you are only on IPv4.

Would libp2p-identify ignore all FromSwarm::NewExternalAddr addresses except for the relayed ones?

Why ignore them?

There seems to be a misunderstanding. I suggest libp2p-identify to track all addresses confirmed via ExternalAddrConfirmed. That would not include relayed addresses. Though one still needs to advertise ones relayed addresses via libp2p-identify. I don't think one should advertise all NewExternalAddrCandidate, but instead only the relayed NewExternalAddrCandidates.

Does that make sense?

I think we can treat a relayed address as externally reachable without verifying it. We did connect to the relay after all so it is kind of implicit that others will be able to connect to the relay too. Thus, I think the relay behaviour could just issue a ToSwarm::NewExternalAddr with the relayed address.

swarm/src/behaviour.rs

swarm/src/lib.rs

thomaseizinger · 2023-05-18T11:30:42Z

Thanks for expanding, that makes sense.

I suggest treating the multiple-interfaces use-case as an additional feature. I prefer to solve the single-interface use-case first. As far as I can tell the single-interface use-case is a step forward, even when the long term goal would be the multiple-interface use-case.

Sounds good to me. I'd like to add a log statement to kademlia then where we are making this assumption. Should make debugging easier in case someone runs into that issue.

I just think it might not be a good idea to assume any incoming connection is publicly routable just because we identified an external address for one of our connections?

Meta level, I think what you are referring to here is "any listen address of an incoming connection is publicly routable". A connection itself is never routable, neither publicly nor privately within a subnet.

Yep that makes more sense :D

Co-authored-by: Max Inden <mail@max-inden.de>

mxinden

Looks good to me apart from the above small suggestion.

We should thoroughly test this in combination with #3877. Easiest step I would guess is to deploy it to kademlia-exporter.max-inden.de/.

We no longer test external addresses.

thomaseizinger · 2023-05-22T08:30:24Z

@mxinden Let me know what you think of 3c38b3f. I didn't see any value in keeping that part of the test because we now have explicit candidates that AutoNAT operates on. I did consider directly flipping to NatStatus::Public on any "confirmed external address" but I am not sure that is a useful feature. I would expect users that use Swarm::add_external_address to not use AutoNAT.

mxinden · 2023-05-22T10:18:31Z

Additional commits look good to me. Only open discussion is #3954 (comment).

thomaseizinger · 2023-05-22T18:22:13Z

I had to delete two more AutoNAT tests that aren't easily fixable with how we treat external addresses now. If you have an idea on how to fix them, feel free to push a commit.

This reverts commit 62ec482.

thomaseizinger · 2023-05-23T10:35:34Z

@mxinden I restored the tests with the idea we discussed. The AutoNAT behaviour now has a function probe_address that can be used to directly probe an address for external reachability.

thomaseizinger · 2023-05-24T07:31:51Z

Additional commits look good to me. Only open discussion is #3954 (comment).

The comment is addressed and the deleted tests are restored, thus I am merging this based on the above approval.

> Observed addresses (aka. external address candidates) of the local node, reported by a remote node > via `libp2p-identify`, are no longer automatically considered confirmed external addresses, in > other words they are no longer trusted by default. Instead users need to confirm the reported > observed address either manually, or by using `libp2p-autonat`. In trusted environments users can > simply extract observed addresses from a `libp2p-identify::Event::Received { info: > libp2p_identify::Info { observed_addr }}` and confirm them via `Swarm::add_external_address`. Follow-up to libp2p#3954.

> Observed addresses (aka. external address candidates) of the local node, reported by a remote node > via `libp2p-identify`, are no longer automatically considered confirmed external addresses, in > other words they are no longer trusted by default. Instead users need to confirm the reported > observed address either manually, or by using `libp2p-autonat`. In trusted environments users can > simply extract observed addresses from a `libp2p-identify::Event::Received { info: > libp2p_identify::Info { observed_addr }}` and confirm them via `Swarm::add_external_address`. Follow-up to #3954. Pull-Request: #4052.

divagant-martian

I think docs were missed here, let me know if it's me who is misunderstanding something

divagant-martian · 2023-06-30T02:29:36Z

swarm/src/behaviour.rs

-    ExpiredExternalAddr(ExpiredExternalAddr<'a>),
+    /// Informs the behaviour that we have discovered a new candidate for an external address for us.
+    NewExternalAddrCandidate(NewExternalAddrCandidate<'a>),
+    /// Informs the behaviour that an external address of the local node was removed.


these docs looks wrong to me, or I'm missing something

divagant-martian · 2023-06-30T02:30:42Z

swarm/src/behaviour.rs

 /// [`FromSwarm`] variant that informs the behaviour that an external address was removed.
 #[derive(Clone, Copy)]
-pub struct ExpiredExternalAddr<'a> {
+pub struct ExternalAddrConfirmed<'a> {
+    pub addr: &'a Multiaddr,
+}


Are these docs correct? They seem to belong to the type below?
I ended up in the pr because docs don't seem to relate to the type name

thomaseizinger · 2023-06-30T09:42:47Z

I think docs were missed here, let me know if it's me who is misunderstanding something

You are right, thanks for spotting! I'll see to fix it up later :)

Discovered in #3954 (review). Pull-Request: #4135.

mxinden · 2023-10-20T12:29:13Z

swarm/src/behaviour.rs

-    /// Informs the behaviour that an external address was removed.
-    ExpiredExternalAddr(ExpiredExternalAddr<'a>),
+    /// Informs the behaviour that we have discovered a new candidate for an external address for us.
+    NewExternalAddrCandidate(NewExternalAddrCandidate<'a>),


This and the events below miss their corresponding copies in SwarmEvent, correct @thomaseizinger? Or was it a deliberate choice to not report these events to the user as well?

Related to #4688.

It wasn't a deliberate choice to omit them but I have to admit I also didn't know that we were activel trying to keep these in sync.

Previously, a `NetworkBehaviour` could report an `AddressScore` for an external address. This score was a `u32` and addresses would be ranked amongst those. In reality, an address is either confirmed to be publicly reachable (via a protocol such as AutoNAT) or merely represents a candidate that might be an external address. In a way, addresses are guilty (private) until proven innocent (publicly reachable). When a `NetworkBehaviour` reports an address candidate, we perform address translation on it to potentially correct for ephemeral ports of TCP. These candidates are then injected back into the `NetworkBehaviour`. Protocols such as AutoNAT can use these addresses as a source for probing their NAT status. Once confirmed, they can emit a `ToSwarm::ExternalAddrConfirmed` event which again will be passed to all `NetworkBehaviour`s. This simplified approach will allow us implement Kademlia's client-mode (libp2p#2032) without additional configuration options: As soon as an address is reported as publicly reachable, we can activate server-mode for that connection. Related: libp2p#3877. Related: libp2p#3953. Related: libp2p#2032. Related: libp2p/go-libp2p#2229. Co-authored-by: Max Inden <mail@max-inden.de> Pull-Request: libp2p#3954.

thomaseizinger added 4 commits May 16, 2023 13:40

Initial refactoring to new events

73b48b5

Generate candidates based on address translation

d5f45b5

Use candidates in AutoNAT

ab12bb7

Only keep minimal external address state in swarm

43058e1

thomaseizinger requested a review from mxinden May 16, 2023 12:55

thomaseizinger mentioned this pull request May 16, 2023

feat(kad): implement automatic client mode #3877

Merged

4 tasks

thomaseizinger added 3 commits May 17, 2023 14:14

Update identify docs

5a945a8

Insert and remove external addresses

4bcb4c1

Update docs

7fef679

mxinden reviewed May 18, 2023

View reviewed changes

mxinden mentioned this pull request May 18, 2023

feat(swarm): remove deprecated items #3956

Merged

4 tasks

Apply suggestions from code review

5f0c01d

Co-authored-by: Max Inden <mail@max-inden.de>

mxinden approved these changes May 22, 2023

View reviewed changes

thomaseizinger added 2 commits May 22, 2023 10:21

Fix doc link

0256f8a

Remove private NAT status test

3c38b3f

We no longer test external addresses.

thomaseizinger added 3 commits May 22, 2023 10:31

Fix more doc links

b1edd30

Merge branch 'master' into 2032/external-addr

7efd6ab

Fix compile error with latest master

fc58b54

Delete tests

62ec482

Update docs

1c5d243

thomaseizinger requested a review from mxinden May 22, 2023 18:33

Revert "Delete tests"

ccc9bf3

This reverts commit 62ec482.

thomaseizinger added 2 commits May 23, 2023 12:33

Fix tests by adding a new API to AutoNAT

a4886f5

Rename to probe_address

d7da756

Merge branch 'master' into 2032/external-addr

5d3084a

thomaseizinger added this to the v0.52.0 milestone May 23, 2023

thomaseizinger added the send-it label May 24, 2023

Merge branch 'master' into 2032/external-addr

452c506

mergify bot merged commit 5e8f2e8 into master May 24, 2023

mergify bot deleted the 2032/external-addr branch May 24, 2023 07:52

thomaseizinger mentioned this pull request May 30, 2023

identify: don't report listen addresses as external addresses #4010

Closed

arpankapoor mentioned this pull request Jun 1, 2023

feat(quic): implement hole punching #3964

Merged

4 tasks

mxinden mentioned this pull request Jun 9, 2023

fix(examples): call Swarm::add_external_address in dcutr and relay #4052

Merged

4 tasks

divagant-martian reviewed Jun 30, 2023

View reviewed changes

thomaseizinger mentioned this pull request Jun 30, 2023

docs(swarm): fix stale docs around confirmed addresses #4135

Merged

4 tasks

mergify bot pushed a commit that referenced this pull request Jul 1, 2023

docs(swarm): fix stale docs around confirmed addresses

68e6bf9

Discovered in #3954 (review). Pull-Request: #4135.

b-zee mentioned this pull request Jul 5, 2023

Identify behavior adds observed address, but contains ephemeral port causing AutoNAT probe failure #4153

Closed

thomaseizinger mentioned this pull request Sep 13, 2023

dcutr: consider more than just external addresses #4496

Closed

mxinden reviewed Oct 20, 2023

View reviewed changes

dmitry-markin mentioned this pull request Apr 4, 2024

Upgrade libp2p to 0.52.4 paritytech/polkadot-sdk#1631

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(swarm): replace address scoring with explicit candidates #3954

feat(swarm): replace address scoring with explicit candidates #3954

thomaseizinger commented May 16, 2023 •

edited

Loading

thomaseizinger commented May 17, 2023

mxinden commented May 17, 2023

thomaseizinger commented May 17, 2023

mxinden commented May 18, 2023

mxinden left a comment

mxinden May 18, 2023

thomaseizinger May 18, 2023

mxinden May 22, 2023

thomaseizinger May 22, 2023

thomaseizinger commented May 18, 2023

mxinden left a comment

thomaseizinger commented May 22, 2023

mxinden commented May 22, 2023

thomaseizinger commented May 22, 2023

thomaseizinger commented May 23, 2023

thomaseizinger commented May 24, 2023

divagant-martian left a comment

divagant-martian Jun 30, 2023 •

edited

Loading

divagant-martian Jun 30, 2023

thomaseizinger commented Jun 30, 2023

mxinden Oct 20, 2023

thomaseizinger Oct 21, 2023

feat(swarm): replace address scoring with explicit candidates #3954

feat(swarm): replace address scoring with explicit candidates #3954

Conversation

thomaseizinger commented May 16, 2023 • edited Loading

Description

Notes & open questions

Change checklist

thomaseizinger commented May 17, 2023

mxinden commented May 17, 2023

thomaseizinger commented May 17, 2023

mxinden commented May 18, 2023

mxinden left a comment

Choose a reason for hiding this comment

mxinden May 18, 2023

Choose a reason for hiding this comment

thomaseizinger May 18, 2023

Choose a reason for hiding this comment

mxinden May 22, 2023

Choose a reason for hiding this comment

thomaseizinger May 22, 2023

Choose a reason for hiding this comment

thomaseizinger commented May 18, 2023

mxinden left a comment

Choose a reason for hiding this comment

thomaseizinger commented May 22, 2023

mxinden commented May 22, 2023

thomaseizinger commented May 22, 2023

thomaseizinger commented May 23, 2023

thomaseizinger commented May 24, 2023

divagant-martian left a comment

Choose a reason for hiding this comment

divagant-martian Jun 30, 2023 • edited Loading

Choose a reason for hiding this comment

divagant-martian Jun 30, 2023

Choose a reason for hiding this comment

thomaseizinger commented Jun 30, 2023

mxinden Oct 20, 2023

Choose a reason for hiding this comment

thomaseizinger Oct 21, 2023

Choose a reason for hiding this comment

thomaseizinger commented May 16, 2023 •

edited

Loading

divagant-martian Jun 30, 2023 •

edited

Loading